# Global Average Pooling

Image encoder checkpoints in `timm` that replace the SigLIP attention pooling head with global average pooling (GAP):

| Model | License | Description | Tags | Library | Downloads | Likes |
|---|---|---|---|---|---|---|
| `vit_so400m_patch16_siglip_gap_384.v2_webli` | Apache-2.0 | ViT image encoder based on SigLIP 2, with the attention pooling head removed in favor of global average pooling; suited to image feature extraction. | Image Classification, Transformers | timm | 19 | 0 |
| `vit_so400m_patch16_siglip_gap_256.v2_webli` | Apache-2.0 | ViT image encoder based on SigLIP 2, using global average pooling with the attention pooling head removed; suited to image feature extraction. | Text-to-Image, Transformers | timm | 22 | 0 |
| `vit_so400m_patch14_siglip_gap_378.v2_webli` | Apache-2.0 | Vision Transformer based on SigLIP 2, pre-trained on the WebLI dataset, with the attention pooling head removed and global average pooling applied. | Image Classification, Transformers | timm | 20 | 0 |
| `vit_so400m_patch14_siglip_gap_224.v2_webli` | Apache-2.0 | ViT image encoder based on SigLIP 2, using global average pooling with the attention pooling head removed; suited to image feature extraction. | Image Classification, Transformers | timm | 179 | 0 |
| `vit_large_patch16_siglip_gap_512.v2_webli` | Apache-2.0 | Vision Transformer based on the SigLIP 2 architecture, designed for image feature extraction, using global average pooling (GAP) instead of the attention pooling head. | Image Classification, Transformers | timm | 29 | 0 |
| `vit_large_patch16_siglip_gap_384.v2_webli` | Apache-2.0 | Vision Transformer based on the SigLIP 2 architecture; a GAP variant that removes the attention pooling head, suited to image feature extraction. | Text-to-Image, Transformers | timm | 95 | 0 |
| `vit_giantopt_patch16_siglip_gap_384.v2_webli` | Apache-2.0 | ViT image encoder based on SigLIP 2, using global average pooling with the attention pooling head removed; suited to image feature extraction. | Image Classification, Transformers | timm | 21 | 0 |
| `vit_giantopt_patch16_siglip_gap_256.v2_webli` | Apache-2.0 | SigLIP 2 ViT image encoder with global average pooling and the attention pooling head removed; packaged for timm. | Image Classification, Transformers | timm | 17 | 0 |
| `vit_base_patch32_siglip_gap_256.v2_webli` | Apache-2.0 | Vision Transformer based on SigLIP 2, using global average pooling (GAP) instead of an attention pooling head for image encoding. | Text-to-Image, Transformers | timm | 25 | 1 |
| `vit_base_patch16_siglip_gap_512.v2_webli` | Apache-2.0 | ViT image encoder based on SigLIP 2, using global average pooling with the attention pooling head removed; suited to image feature extraction. | Image Classification, Transformers | timm | 105 | 0 |
| `vit_base_patch16_siglip_gap_384.v2_webli` | Apache-2.0 | ViT image encoder based on SigLIP 2, using global average pooling (GAP) instead of the attention pooling head; suited to image feature extraction. | Image Classification, Transformers | timm | 105 | 0 |
| `vit_base_patch16_siglip_gap_256.v2_webli` | Apache-2.0 | ViT image encoder based on SigLIP 2, using global average pooling with the attention pooling head removed; suited to image feature extraction. | Multimodal Fusion, Transformers | timm | 114 | 1 |
| `vit_base_patch16_siglip_gap_224.v2_webli` | Apache-2.0 | Vision Transformer based on SigLIP 2, using global average pooling to produce image features. | Image Classification, Transformers | timm | 303 | 0 |
| `vit_so400m_patch16_siglip_gap_512.v2_webli` | Apache-2.0 | ViT image encoder based on SigLIP 2, using global average pooling; suited to vision-language tasks. | Text-to-Image, Transformers | timm | 21 | 0 |
| `vit_so400m_patch14_siglip_gap_896.pali_pt` | Apache-2.0 | Vision model based on the SigLIP image encoder, using global average pooling; part of the PaliGemma project. | Text-to-Image, Transformers | timm | 15 | 1 |
| `vit_so400m_patch14_siglip_gap_896.pali2_3b_pt` | Apache-2.0 | Vision model based on the SigLIP image encoder, using global average pooling; part of the PaliGemma 2 project. | Text-to-Image, Transformers | timm | 14 | 1 |
| `vit_so400m_patch14_siglip_gap_448.pali_mix` | Apache-2.0 | SigLIP image encoder from a vision-language model, using global average pooling; suited to multimodal tasks. | Text-to-Image, Transformers | timm | 15 | 0 |
| `vit_large_patch16_siglip_gap_384.webli` | Apache-2.0 | Vision Transformer based on SigLIP, using global average pooling; suited to image feature extraction. | Image Classification, Transformers | timm | 13 | 0 |
| `vit_base_patch16_siglip_gap_224.webli` | Apache-2.0 | Vision Transformer based on SigLIP, containing only the image encoder, with a global average pooling strategy. | Image Classification, Transformers | timm | 178 | 1 |